Binary Neural Networks Algorithms, Architectures, and Applications (Baochang Zhang, Sheng Xu, Mingbao Lin etc.)

DCP-NAS: Discrepant Child-Parent Neural Architecture Search for 1-Bit CNNs


$\mathrm{Tr}(\cdot)$ represents the trace of a matrix. However, the term $\partial w^t / \partial \alpha^t$ in Eq. 4.43 is undefined and cannot be solved by the normal backpropagation process. To address this problem, we propose a decoupled optimization method as follows. In the following, we omit the superscript $\cdot^t$ and define $\tilde{L}$ as

$$
\tilde{L} = \left(\frac{\partial L(\alpha, w)}{\partial w}\right)^{T} / \alpha, \tag{4.44}
$$

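which considers the coupling optimization problem as in Eq. 4.42. Note that $R(\cdot)$ is only considered when backtracking. Thus, we have

$$
\frac{\partial L(\alpha, w)}{\partial w} = \mathrm{Tr}\left[\alpha \tilde{L} \frac{\partial w}{\partial \alpha}\right]. \tag{4.45}
$$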

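To simplify the derivation, we rewrite $\tilde{L}$ as $[\tilde{g}_1, \cdots, \tilde{g}_e, \cdots, \tilde{g}_E]$, where each $\tilde{g}_e$ is a column vector. Assuming that $w_m$ and $\alpha_{i,j}$ are independent when $m \neq j$, where $\alpha_{i,j}$ denotes a specific element of the matrix $\alpha$, we have

$$
\left(\frac{\partial w}{\partial \alpha}\right)_m =
\begin{bmatrix}
\cdots & \dfrac{\partial w_m}{\partial \alpha_{1,m}} & \cdots \\
 & \vdots & \\
\cdots & \dfrac{\partial w_m}{\partial \alpha_{e,m}} & \cdots \\
 & \vdots & \\
\cdots & \dfrac{\partial w_m}{\partial \alpha_{E,m}} & \cdots
\end{bmatrix}_{E \times M}
\tag{4.46}
$$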

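and, rewriting $\alpha$ as a column vector $[\alpha_1, \cdots, \alpha_e, \cdots, \alpha_E]^T$ in which each $\alpha_e$ is a row vector, we have

$$
\alpha \tilde{L} =
\begin{bmatrix}
\alpha_1 \tilde{g}_1 & \cdots & \alpha_1 \tilde{g}_e & \cdots & \alpha_1 \tilde{g}_E \\
\vdots & & \vdots & & \vdots \\
\alpha_e \tilde{g}_1 & \cdots & \alpha_e \tilde{g}_e & \cdots & \alpha_e \tilde{g}_E \\
\vdots & & \vdots & & \vdots \\
\alpha_E \tilde{g}_1 & \cdots & \alpha_E \tilde{g}_e & \cdots & \alpha_E \tilde{g}_E
\end{bmatrix}_{E \times E}.
\tag{4.47}
$$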
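To make the block notation of Eq. 4.47 concrete, here is a minimal sketch in plain Python. The sizes ($E = 2$, $M = 3$), the numeric values, and the names `matmul`, `alpha`, `g`, `L_tilde`, and `alpha_L` are illustrative assumptions that do not come from the text. The sketch only shows that entry $(e, e')$ of $\alpha \tilde{L}$ is the inner product of the row $\alpha_e$ with the column $\tilde{g}_{e'}$.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

# Hypothetical toy sizes: E = 2 rows in alpha, M = 3 entries per row.
alpha = [[1, 2, 3],
         [4, 5, 6]]                        # E x M, row e plays the role of alpha_e
g = [[1, 0, 2],
     [0, 1, 1]]                            # g[e] holds the column vector g~_e (length M)

# L~ = [g~_1, ..., g~_E]: the vectors g~_e are the columns of an M x E matrix.
L_tilde = [list(col) for col in zip(*g)]

alpha_L = matmul(alpha, L_tilde)           # E x E, as in Eq. 4.47

# Entry (e, e') should equal the inner product of alpha_e and g~_e'.
inner = [[sum(a * b for a, b in zip(alpha[e], g[e2])) for e2 in range(2)]
         for e in range(2)]
print(alpha_L == inner)                    # True
```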

Combining Eq. 4.46 and Eq. 4.47, the matrix in the trace term of Eq. 4.45 can be written as

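$$
\alpha \tilde{L} \left(\frac{\partial w}{\partial \alpha}\right)_m =
\begin{bmatrix}
\cdots & \alpha_1 \sum_{e'=1}^{E} \tilde{g}_{e'} \dfrac{\partial w_m}{\partial \alpha_{e',m}} & \cdots \\
 & \vdots & \\
\cdots & \alpha_e \sum_{e'=1}^{E} \tilde{g}_{e'} \dfrac{\partial w_m}{\partial \alpha_{e',m}} & \cdots \\
 & \vdots & \\
\cdots & \alpha_E \sum_{e'=1}^{E} \tilde{g}_{e'} \dfrac{\partial w_m}{\partial \alpha_{e',m}} & \cdots
\end{bmatrix}_{E \times M}
\tag{4.48}
$$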

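Thus the whole matrix $\alpha \tilde{L} \frac{\partial w}{\partial \alpha}$ is of size $E \times M \times M$. After the above derivation, we compute the $e$-th component of the trace term in Eq. 4.45 as

$$
\mathrm{Tr}\left[\alpha \tilde{L} \frac{\partial w}{\partial \alpha}\right]_e = \alpha_e \sum_{m=1}^{M} \sum_{e'=1}^{E} \tilde{g}_{e'} \frac{\partial w_m}{\partial \alpha_{e',m}}. \tag{4.49}
$$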
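The step from the per-slice product to Eq. 4.49 can be checked numerically. The following is a minimal sketch under stated assumptions, not the authors' implementation: the toy sizes, the values, and all function names are hypothetical. `D[e][m]` stands for $\partial w_m / \partial \alpha_{e,m}$, and the slice $(\partial w / \partial \alpha)_m$ is built with only its $m$-th column nonzero, which is one reading of the independence assumption between $w_m$ and $\alpha_{i,j}$ for $m \neq j$.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def slice_m(D, m):
    """(dw/dalpha)_m as an E x M matrix; only column m is nonzero (assumed reading)."""
    return [[D[e][j] if j == m else 0 for j in range(len(D[0]))]
            for e in range(len(D))]

def trace_component_direct(alpha, g, D, e):
    """Sum over m of entry (e, m) of alpha L~ (dw/dalpha)_m."""
    L_tilde = [list(col) for col in zip(*g)]   # M x E, columns are g~_e
    alpha_L = matmul(alpha, L_tilde)           # E x E
    M = len(alpha[0])
    return sum(matmul(alpha_L, slice_m(D, m))[e][m] for m in range(M))

def trace_component_closed(alpha, g, D, e):
    """Closed form: alpha_e applied to sum_m sum_e' g~_e' * D[e'][m]."""
    E, M = len(alpha), len(alpha[0])
    v = [sum(g[e2][k] * D[e2][m] for m in range(M) for e2 in range(E))
         for k in range(M)]
    return sum(a * x for a, x in zip(alpha[e], v))

# Hypothetical toy data with E = 2 and M = 3.
alpha = [[1, 2, 3], [4, 5, 6]]
g = [[1, 0, 2], [0, 1, 1]]
D = [[1, 2, 0], [3, 0, 1]]

direct = [trace_component_direct(alpha, g, D, e) for e in range(2)]
closed = [trace_component_closed(alpha, g, D, e) for e in range(2)]
print(direct, closed)   # [41, 92] [41, 92]
```

Both routes agree because the factor $\alpha_e$ does not depend on $m$ or $e'$ and can be pulled out of the double sum.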

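Note that in the vanilla propagation process, $\alpha^{t+1} = \alpha^t - \eta_1 \frac{\partial L(\alpha^t)}{\partial \alpha^t}$. Combining this with Eq. 4.49, we have

$$
\tilde{\alpha}^{t+1} = \alpha^{t+1} - \eta
\begin{bmatrix}
\sum_{m=1}^{M} \sum_{e'=1}^{E} \tilde{g}_{e'} \dfrac{\partial w_m}{\partial \alpha_{e',m}} \\
\vdots \\
\sum_{m=1}^{M} \sum_{e'=1}^{E} \tilde{g}_{e'} \dfrac{\partial w_m}{\partial \alpha_{e',m}} \\
\vdots \\
\sum_{m=1}^{M} \sum_{e'=1}^{E} \tilde{g}_{e'} \dfrac{\partial w_m}{\partial \alpha_{e',m}}
\end{bmatrix}
\circledast
\begin{bmatrix}
\alpha_1 \\
\vdots \\
\alpha_e \\
\vdots \\
\alpha_E
\end{bmatrix}
= \alpha^{t+1} + \eta \psi^t \circledast \alpha^t,
\tag{4.50}
$$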
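As a rough illustration of the update in Eq. 4.50, the sketch below applies the correction term to a toy $\alpha$. Several points are assumptions rather than statements from the text: $\circledast$ is treated as an element-wise product, $\psi^t$ is taken to be the negative of the bracketed stack, each stacked entry is read as a length-$M$ vector so that $\psi^t$ has the same $E \times M$ shape as $\alpha$, and `alpha_next` is simply supplied as the result of the vanilla step. The function name `decoupled_update`, the step size, and all values are hypothetical.

```python
def decoupled_update(alpha_next, alpha, g, D, eta):
    """Return (alpha_tilde, psi) for one update in the form of Eq. 4.50.

    alpha_next : E x M result of the vanilla step (alpha^{t+1}), given as input
    alpha      : E x M current parameters (alpha^t)
    g          : g[e] is the vector g~_e of length M
    D          : D[e][m] stands for d w_m / d alpha_{e,m}
    eta        : step size for the correction term
    """
    E, M = len(alpha), len(alpha[0])
    # Stacked entry: sum_m sum_e' g~_e' * D[e'][m]; it is the same for every row.
    v = [sum(g[e2][k] * D[e2][m] for m in range(M) for e2 in range(E))
         for k in range(M)]
    psi = [[-x for x in v] for _ in range(E)]   # assumed: psi is minus the stack
    alpha_tilde = [[alpha_next[e][k] + eta * psi[e][k] * alpha[e][k]
                    for k in range(M)] for e in range(E)]
    return alpha_tilde, psi

# Hypothetical toy data with E = 2 and M = 3.
alpha = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
alpha_next = [[0.5, 1.5, 2.5], [3.5, 4.5, 5.5]]
g = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
D = [[1.0, 2.0, 0.0], [3.0, 0.0, 1.0]]

alpha_tilde, psi = decoupled_update(alpha_next, alpha, g, D, eta=0.5)
print(psi)          # [[-3.0, -4.0, -10.0], [-3.0, -4.0, -10.0]]
print(alpha_tilde)  # [[-1.0, -2.5, -12.5], [-2.5, -5.5, -24.5]]
```

In this reading the correction is proportional to $\alpha^t$ itself, so entries of $\alpha^t$ that are zero receive no extra adjustment beyond the vanilla step.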